Extracting Information from Indian First Names

Author

  • Akshay Gulati
Abstract

A person's first name can reveal important demographic and cultural information about that person. This paper proposes statistical models for extracting three pieces of information from Indian first names: gender, religion, and name validity. The models combine classical features, such as n-grams and Levenshtein distance, with self-observed features, such as a vowel score and religion belief. The models were evaluated rigorously with several machine learning algorithms, comparing accuracy, F-measure, Kappa statistic, and RMS error. The experimental results are promising and indicate that the proposed models can be used directly in other information extraction systems.
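The abstract names its features but does not define how they are computed. The following is a minimal Python sketch of the two classical features (character n-grams and Levenshtein distance) as they are commonly defined, plus two illustrative helpers. `vowel_score` (here, the fraction of letters that are vowels) and `nearest_known_name` (a lookup against a list of known names as a rough validity signal) are hypothetical stand-ins, not the author's definitions; the religion-belief feature is not sketched because the abstract gives no detail about it.

```python
from collections import Counter

VOWELS = set("aeiou")


def char_ngrams(name: str, n: int = 2) -> Counter:
    """Count character n-grams of a lowercased name padded with ^ and $ boundary markers."""
    padded = f"^{name.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def vowel_score(name: str) -> float:
    """HYPOTHETICAL vowel feature: fraction of alphabetic characters that are a/e/i/o/u."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def nearest_known_name(name: str, known_names: list[str]) -> tuple[str, int]:
    """HYPOTHETICAL validity signal: closest known name and its edit distance.

    Raises ValueError if known_names is empty.
    """
    return min(((k, levenshtein(name.lower(), k.lower())) for k in known_names),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    known = ["Akshay", "Anil", "Priya"]  # illustrative list, not data from the paper
    print(char_ngrams("Anil"))
    print(levenshtein("Akshai", "Akshay"))          # 1
    print(round(vowel_score("Priya"), 2))           # 0.4
    print(nearest_known_name("Akshai", known))      # ('Akshay', 1)
```

A small edit distance to a known name suggests a likely valid (possibly misspelled) name, while n-gram counts can feed a classifier for gender or religion; how the paper actually weights or combines these features is not stated in the abstract.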

Similar sources

A General Approach to Extracting Full Names and Abbreviations for Chinese Entities from the Web

Identifying full names and abbreviations for entities is a challenging problem in many applications, e.g. question answering and information retrieval. In this paper, we propose a general method for extracting full names and abbreviations from Chinese Web corpora. For a given entity, we construct forward and backward query items and submit them to a search engine (e.g. Google), and utilize se...

ONER: Tool for Organization Named Entity Recognition from Affiliation Strings in PubMed Abstracts

Automatically extracting organization names from the affiliation sentences of articles related to biomedicine is of great interest to the pharmaceutical marketing industry, health care funding agencies and public health officials. It will also be useful for other scientists in normalizing author names, automatically creating citations, indexing articles and identifying potential resources or co...

Standardized American Indians: The "Names of Indian tribes and bands" list from the Office of Indian Affairs

The inconsistent spelling of American Indian tribal names at the end of the nineteenth century led in part to the development within the Office of Indian Affairs of an array of 270 standardized identifiers, ranging from Absaroka to Zuñi. These efforts paralleled the simultaneous improvement of a large suite of relevant terms by the United States Board on Geographic Names. Both compilations were...

ESM-IL: Entity Extraction from Social Media Text for Indian Languages @ FIRE 2015 - An Overview

Entity recognition is a very important subtask of information extraction and finds its applications in information retrieval, machine translation and other higher Natural Language Processing (NLP) applications such as co-reference resolution. Entities are real-world elements or objects such as Person names, Organization names, Product names, Location names. Entities are often referred to as Nam...

Effect of a thermal power plant waste fly ash on leguminous and non-leguminous leafy vegetables in extracting maximum benefits from P and K fertilization

Although the Indian population is largely vegetarian, not much attention has been given to the cultivation of vegetables, as compared to other crops like cereals, pulses and oil seeds. Therefore, the present study was conducted on two leafy vegetables, spinach (Spinacia oleracea L.) and methi (Trigonella foenum-graecum L.), commonly grown in Aligarh, as two popular vegetables of the Indian diet....

Publication date: 2015